Skip to content

fix(gossipsub): remove closed outbound streams from the registry#3531

Merged
tabcat merged 4 commits into
libp2p:mainfrom
tabcat:fix/gossipsub-stale-outbound-stream
Jun 6, 2026
Merged

fix(gossipsub): remove closed outbound streams from the registry#3531
tabcat merged 4 commits into
libp2p:mainfrom
tabcat:fix/gossipsub-stale-outbound-stream

Conversation

@tabcat

@tabcat tabcat commented Jun 4, 2026

Copy link
Copy Markdown
Member

Description

When an outbound stream closed (reset, transport drop, remote close), the closed stream was left in streamsOutbound. The createOutboundStream guard then short-circuited on the stale entry, so when the peer reconnected no new outbound stream was created and every further outbound message to that peer failed.

Remove an outbound stream from the registry when it closes, so a new one is created the next time the peer (re)connects.

Related #2771.
References ChainSafe/js-libp2p-gossipsub#534, which fixes the same issue in the pre-migration repository.

Notes & open questions

Scope: this restores the stream when the peer reconnects, which is the reported case. Re-establishing a stream that closes while its connection stays open, where the remote never re-opens, is left as a follow-up.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

tabcat added 2 commits June 4, 2026 19:00
A closed outbound stream was left in `streamsOutbound`, which blocked
every further outbound message to that peer for the life of the
connection. Remove it when it closes so a new stream can be created.
Reconcile outbound streams each heartbeat so a connected peer regains an
outbound stream after the previous one closed on a still-open connection.
@tabcat tabcat requested a review from a team as a code owner June 4, 2026 12:43
@tabcat tabcat marked this pull request as draft June 4, 2026 16:41
@tabcat

tabcat commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

Going to remove the heartbeat reconnect. Feeling its too aggressive. Could be good to have in the future.

@tabcat tabcat changed the title fix(gossipsub): recreate outbound streams closed on live connections fix(gossipsub): remove closed outbound streams from the registry Jun 5, 2026
@tabcat tabcat marked this pull request as ready for review June 5, 2026 11:04
@tabcat tabcat merged commit 7ae12f9 into libp2p:main Jun 6, 2026
47 of 48 checks passed
@tabcat tabcat mentioned this pull request Jun 6, 2026
@larrype8

Copy link
Copy Markdown

I am curious if there will be any ongoing work related to the scope that was not performed in this PR, "Scope: this restores the stream when the peer reconnects, which is the reported case. Re-establishing a stream that closes while its connection stays open, where the remote never re-opens, is left as a follow-up." I was working on a patch for gossipsub not handling the stream close event and having a dead stream in the list and found your fix for that which i have incorporated. I was also working toward not long handling the error but actively re-establishing the stream or disconnecting the stream. I am not sure waiting for he peer to reconnect should be the only strategy but I am not sure about that. Could you elaborate on any future plans to work on active re-establishment of the failed stream?

@tabcat

tabcat commented Jun 28, 2026

Copy link
Copy Markdown
Member Author

Hey @larrype8, I'm not convinced active re-dialing in gossipsub is necessary yet. Most of the problem this PR is solving is a case where a device goes offline, comes back sometime later, and then tries to dial the same gossipsub peer again (so we know the gossipsub peer is already being dialed). But if you experience a problem in implementation, please do create an issue. Thanks

@larrype8

Copy link
Copy Markdown

Thanks for the reply. The main problem we experienced was the one you fixed in this PR. I will monitor it and log an issue if we experience a problem getting the peer reconnected with the exact condition because I am also not sure how it will behave with this fix in place. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants